WORLD INTELLECTUAL PROPERTY ORGANIZATION 
International Bureau 




PCT

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(51) International Patent Classification: C07H 21/02, 21/04, C12P 19/34, C12Q 1/68

(11) International Publication Number: WO 97/48717 A1

(43) International Publication Date: 24 December 1997 (24.12.97)

(21) International Application Number: PCT/US97/10748

(22) International Filing Date: 18 June 1997 (18.06.97)

(30) Priority Data: 08/665,565, 18 June 1996 (18.06.96), US

(71) Applicant: RECOMBINANT BIOCATALYSIS, INC. [US/US]; 505 East Coast Boulevard South, La Jolla, CA 92037 (US).

(72) Inventors: SHORT, Jay, M.; 320 Delage Drive, Encinitas, CA 92024 (US). MATHUR, Eric, J.; 2654 Galicia Way, Carlsbad, CA 92009 (US).

(74) Agent: HAILE, Lisa, A.; Fish & Richardson P.C., Suite 1400, 4225 Executive Square, La Jolla, CA 92037 (US).

(81) Designated States: AU, CA, JP, European patent (AT, BE, CH, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE).



Published 

With international search report. 



(54) Title: PRODUCTION AND USE OF NORMALIZED DNA LIBRARIES

(57) Abstract

Disclosed is a process for forming a normalized genomic DNA library from an environmental sample by (a) isolating a genomic
DNA population from the environmental sample; (b) analyzing the complexity of the genomic DNA population so isolated; (c) at least
one of (i) amplifying the copy number of the DNA population so isolated and (ii) recovering a fraction of the isolated genomic DNA
having a desired characteristic; and (d) normalizing the representation of various DNAs within the genomic DNA population so as to form
a normalized library of genomic DNA from the environmental sample. Also disclosed is a normalized genomic DNA library formed from
an environmental sample by the process.






PRODUCTION AND USE OF NORMALIZED DNA LIBRARIES 

The present invention relates to the field of production and screening of gene libraries,
and more particularly to the generation and screening of normalized genomic DNA
libraries from mixed populations of microbes and/or other organisms.

BACKGROUND OF THE INVENTION 

There has been increasing demand in the research reagent, diagnostic reagent and
chemical process industries for protein-based catalysts possessing novel capabilities. At
present, this need is largely addressed using enzymes purified from a variety of cultivated
bacteria or fungi. However, because less than 1% of naturally occurring microbes can
be grown in pure culture (Amann, 1995), alternative techniques must be developed to
exploit the full breadth of microbial diversity for potentially valuable new products.

Virtually all of the commercial enzymes now in use have come from cultured organisms.
Most of these organisms are bacteria or fungi. Amann et al. (Amann, 1995) have
estimated the culturability of microorganisms in the environment as follows:



Habitat                         Culturability (%)

Seawater                        0.001-0.1
Freshwater                      0.25
Mesotrophic lake                0.01-1.0
Unpolluted estuarine waters     0.1-3.0
Activated sludge                1.0-15.0
Sediments                       0.25
Soil                            0.3

These data were determined from published information regarding the number of
cultivated microorganisms derived from the various habitats indicated.

Other studies have also demonstrated that cultivated organisms comprise only a small
fraction of the biomass present in the environment. For example, one group of workers
recently reported the collection of water and sediment samples from the "Obsidian Pool"
in Yellowstone National Park (Barns, 1994) where they found cells hybridizing to
archaea-specific probes in 55% of 75 enrichment cultures. Amplification and cloning
of 16S rRNA encoding sequences revealed mostly unique sequences with little or no
representation of the organisms which had previously been cultured from this pool,
suggesting the existence of substantial diversity of archaea with so far unknown
morphological, physiological and biochemical features. Another group performed
similar studies on the cyanobacterial mat of Octopus Spring in Yellowstone Park and
came to the same conclusion; namely, tremendous uncultured diversity exists (Ward,
1990). Giovannoni et al. (1990) and Torsvik et al. (1990a) have reported similar results
using bacterioplankton collected in the Sargasso Sea and in soil samples, respectively.
These results indicate that the exclusive use of cultured organisms in screening for useful
enzymatic or other bioactivities severely limits the sampling of the potential diversity in
existence.

Screening of gene libraries from cultured samples has already proven valuable. It has
recently been made clear, however, that the use of only cultured organisms for library
generation limits access to the diversity of nature. The uncultivated organisms present
in the environment, and/or enzymes or other bioactivities derived therefrom, may be useful
in industrial processes. The cultivation of each organism represented in any given
environmental sample would require significant time and effort. It has been estimated
that in a rich sample of soil, more than 10,000 different species can be present. It is
apparent that attempting to individually cultivate each of these species would be a
cumbersome task. Therefore, novel methods of efficiently accessing the diversity present
in the environment are highly desirable.

SUMMARY OF THE INVENTION

The present invention addresses this need by providing methods to isolate the DNA from
a variety of sources, including isolated organisms, consortia of microorganisms, primary
enrichments, and environmental samples, to make libraries which have been
"normalized" in their representation of the genome populations in the original samples,
and to screen these libraries for enzyme and other bioactivities.

The present invention represents a novel, recombinant approach to generate and screen
DNA libraries constructed from mixed microbial populations of cultivated or, preferably,
uncultivated (or "environmental") samples. In accordance with the present invention,
libraries with equivalent representation of genomes from microbes that can differ vastly
in abundance in natural populations are generated and screened. This "normalization"
approach reduces the redundancy of clones from abundant species and increases the
representation of clones from rare species. These normalized libraries allow for greater
screening efficiency resulting in the isolation of genes encoding novel biological
catalysts.

Screening of mixed populations of organisms has been made a rational approach because
of the availability of the techniques described herein, whereas previous attempts at
screening of mixed populations were not feasible and were avoided because of the
cumbersome procedures required.

Thus, in one aspect the invention provides a process for forming a normalized genomic
DNA library from an environmental sample by (a) isolating a genomic DNA population
from the environmental sample; (b) analyzing the complexity of the genomic DNA
population so isolated; (c) at least one of (i) amplifying the copy number of the DNA
population so isolated and (ii) recovering a fraction of the isolated genomic DNA having
a desired characteristic; and (d) normalizing the representation of various DNAs within
the genomic DNA population so as to form a normalized library of genomic DNA from
the environmental sample.

In one preferred embodiment of this aspect, the process comprises the step of recovering
a fraction of the isolated genomic DNA having a desired characteristic.

In another preferred embodiment of this aspect, the process comprises the step of 
amplifying the copy number of the DNA population so isolated. 

In another preferred embodiment of this aspect, the step of amplifying the genomic DNA
precedes the normalizing step. In an alternate preferred embodiment of this aspect, the
step of normalizing the genomic DNA precedes the amplifying step.

In another preferred embodiment of this aspect, the process comprises both the steps of
(i) amplifying the copy number of the DNA population so isolated and (ii) recovering a
fraction of the isolated genomic DNA having a desired characteristic.

Another aspect of the invention provides a normalized genomic DNA library formed
from an environmental sample by a process comprising the steps of (a) isolating a
genomic DNA population from the environmental sample; (b) analyzing the complexity
of the genomic DNA population so isolated; (c) at least one of (i) amplifying the copy
number of the DNA population so isolated and (ii) recovering a fraction of the isolated
genomic DNA having a desired characteristic; and (d) normalizing the representation of
various DNAs within the genomic DNA population so as to form a normalized library
of genomic DNA from the environmental sample. The various preferred embodiments
described with respect to the above method aspect of the invention are likewise
applicable with regard to this aspect of the invention.

The invention also provides a process for forming a normalized genomic DNA library
from an environmental sample by (a) isolating a genomic DNA population from the
environmental sample; (b) analyzing the complexity of the genomic DNA population so
isolated; (c) at least one of (i) amplifying the copy number of the DNA population so
isolated and (ii) recovering a fraction of the isolated genomic DNA having a desired
characteristic; and (d) normalizing the representation of various DNAs within the
genomic DNA population so as to form a normalized library of genomic DNA from the
environmental sample.

Another aspect of the invention provides a normalized genomic DNA library formed
from an environmental sample by a process comprising the steps of (a) isolating a
genomic DNA population from the environmental sample; (b) analyzing the complexity
of the genomic DNA population so isolated; (c) at least one of (i) amplifying the copy
number of the DNA population so isolated and (ii) recovering a fraction of the isolated
genomic DNA having a desired characteristic; and (d) normalizing the representation of
various DNAs within the genomic DNA population so as to form a normalized library
of genomic DNA from the environmental sample. The various preferred embodiments
described with respect to the above method aspect of the invention are likewise
applicable with regard to this aspect of the invention.

BRIEF DESCRIPTION OF THE DRAWING

Figure 1 is a graph showing the percent of total DNA content represented by G + C in
the various genomic DNA isolates tested as described in Example 2.

DETAILED DESCRIPTION OF THE INVENTION

DNA ISOLATION:

An important step in the generation of a normalized DNA library from an environmental
sample is the preparation of nucleic acid from the sample. DNA can be isolated from
samples using various techniques well known in the art (Nucleic Acids in the
Environment: Methods & Applications, J.T. Trevors, D.D. van Elsas, Springer
Laboratory, 1995). Preferably, the DNA obtained will be of large size and free of enzyme
inhibitors and other contaminants. DNA can be isolated directly from the environmental
sample (direct lysis) or cells may be harvested from the sample prior to DNA recovery
(cell separation). Direct lysis procedures have several advantages over protocols based
on cell separation. The direct lysis technique provides more DNA with a generally
higher representation of the microbial community; however, the DNA is sometimes smaller in
size and more likely to contain enzyme inhibitors than DNA recovered using the cell
separation technique. Very useful direct lysis techniques have recently been described
which provide DNA of high molecular weight and high purity (Barns, 1994; Holben,
1994). If inhibitors are present, several protocols which utilize cell isolation
can be employed (Holben, 1994). Additionally, a fractionation technique, such
as the bis-benzimide separation (cesium chloride isolation) described below, can be used
to enhance the purity of the DNA.

ANALYSIS OF COMPLEXITY: 

The complexity of the nucleic acid recovered from the environmental samples
can be important to monitor during the isolation and normalization processes. 16S rRNA
analysis is one technique that can be used to analyze the complexity of the DNA
recovered from environmental samples (Reysenbach, 1992; DeLong, 1992; Barns, 1994).
Primers have been described for the specific amplification of 16S rRNA genes from each
of the three described domains.

FRACTIONATION: 

Fractionation of the DNA samples prior to normalization increases the chances of
cloning DNA from minor species from the pool of organisms sampled. In the present
invention, DNA is preferably fractionated using a density centrifugation technique. One
example of such a technique is a cesium-chloride gradient. Preferably, the technique is
performed in the presence of a nucleic acid intercalating agent which will bind regions
of the DNA and cause a change in the buoyant density of the nucleic acid. More
preferably, the nucleic acid intercalating agent is a dye, such as bis-benzimide, which will
preferentially bind regions of DNA (AT in the case of bis-benzimide) (Muller, 1975;
Manuelidis, 1977). When nucleic acid complexed with an intercalating agent, such as
bis-benzimide, is separated in an appropriate cesium-chloride gradient, the nucleic acid
is fractionated. If the intercalating agent preferentially binds regions of the DNA, such
as GC or AT regions, the nucleic acid is separated based on relative base content in the
DNA. Nucleic acid from multiple organisms can be separated in this manner.
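
Because the separation described above depends on relative base content, it can be useful to estimate the G+C fraction of candidate sequences beforehand. The following is a minimal sketch only; the function names and the example sequences in the test are hypothetical and are not part of the disclosed method. It computes the G+C fraction of a sequence and ranks a set of genomes from most AT-rich to most GC-rich, which serves only as a rough proxy for the order in which peaks might be expected to separate.

```python
def gc_fraction(sequence):
    """Return the fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return gc / total


def rank_by_gc(genomes):
    """Return (name, GC fraction) pairs sorted from most AT-rich to most GC-rich.

    `genomes` maps a hypothetical organism label to a DNA sequence string.
    """
    ranked = [(name, gc_fraction(seq)) for name, seq in genomes.items()]
    return sorted(ranked, key=lambda item: item[1])
```

Ambiguous bases such as N are ignored rather than counted, so the result reflects only the positions that were actually called.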

Density gradients are currently employed to fractionate nucleic acids. For example, the
use of bis-benzimide density gradients for the separation of microbial nucleic acids for
use in soil typing and bioremediation has been described. In these experiments, one
evaluates the relative abundance of A260 peaks within fixed benzimide gradients before
and after remediation treatment to see how the bacterial populations have been affected.
The technique relies on the premise that, on average, the GC content of a species is
relatively consistent. This technique is applied in the present invention to fractionate
complex mixtures of genomes. The nucleic acids derived from a sample are subjected
to ultracentrifugation and fractionated while measuring the A260 as in the published
procedures.

In one aspect of the present invention, equal A260 units are removed from each peak, the
nucleic acid is amplified using a variety of amplification protocols known in the art,
including those described hereafter, and gene libraries are prepared. Alternatively, equal
A260 units are removed from each peak, and gene libraries are prepared directly from this
nucleic acid. Thus, gene libraries are prepared from a combination of equal amounts of
DNA from each peak. This strategy enables access to genes from minority organisms
within environmental samples and enrichments, whose genomes may not be represented,
or may even be lost, because the organisms are present in such minor quantity,
if a library were constructed from the total unfractionated DNA sample. Alternatively,
DNA can be normalized subsequent to fractionation, using techniques described
hereafter. DNA libraries can then be generated from this fractionated/normalized DNA.
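
The pooling of equal A260 units from each peak can be sketched numerically. The example below is illustrative only: it assumes that one A260 unit is the amount of nucleic acid in 1 ml of solution reading A260 = 1.0 in a 1 cm light path, and the peak names and absorbance readings are hypothetical. It returns the volume to withdraw from each peak fraction so that every peak contributes the same number of A260 units.

```python
def volumes_for_equal_a260(peak_a260, units_per_peak):
    """Volume (ml) to draw from each peak so all peaks contribute equal A260 units.

    Assumes units = A260 * volume_ml (1 cm path length), so
    volume_ml = units_per_peak / A260 for each peak.
    """
    volumes = {}
    for peak, a260 in peak_a260.items():
        if a260 <= 0:
            raise ValueError(f"peak {peak!r} has no measurable absorbance")
        volumes[peak] = units_per_peak / a260
    return volumes


# Hypothetical A260 readings for three gradient peaks.
readings = {"peak_1": 2.0, "peak_2": 0.5, "peak_3": 0.1}
pooled = volumes_for_equal_a260(readings, units_per_peak=0.05)
```

In this sketch the faintest peak requires the largest volume, which mirrors the point made above: minority genomes are deliberately over-sampled relative to their abundance in the unfractionated DNA.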

The composition of multiple fractions of the fractionated nucleic acid can be determined
using PCR-related amplification methods of classification well known in the art.

NORMALIZATION: 

Previous normalization protocols have been designed for constructing normalized cDNA
libraries (WO 95/08647, WO 95/11986). These protocols were originally developed for
the cloning and isolation of rare cDNAs derived from mRNA. The present invention
relates to the generation of normalized genomic DNA gene libraries from uncultured or
environmental samples.

Nucleic acid samples isolated directly from environmental samples or from primary
enrichment cultures will typically contain genomes from a large number of
microorganisms. These complex communities of organisms can be described by the
absolute number of species present within a population and by the relative abundance of
each organism within the sample. Total normalization of each organism within a
sample is very difficult to achieve. Separation techniques such as optical tweezers can
be used to pick morphologically distinct members within a sample. Cells from each
member can then be combined in equal numbers, or pure cultures of each member within
a sample can be prepared and equal numbers of cells from each pure culture combined
to achieve normalization. In practice, this is very difficult to perform, especially in a
high-throughput manner.

The present invention involves the use of techniques to approach normalization of the
genomes present within an environmental sample, generating a DNA library from the
normalized nucleic acid, and screening the library for an activity of interest.

In one aspect of the present invention, DNA is isolated from the sample and fractionated.
The strands of nucleic acid are then melted and allowed to selectively reanneal under
fixed conditions (Cot driven hybridization). Alternatively, DNA is not fractionated prior
to this melting process. When a mixture of nucleic acid fragments is melted and allowed
to reanneal under stringent conditions, the common sequences find their complementary
strands faster than the rare sequences. After an optional single-stranded nucleic acid
isolation step, single-stranded nucleic acid, representing an enrichment of rare sequences,
is amplified and used to generate gene libraries. This procedure leads to the
amplification of rare or low abundance nucleic acid molecules. These molecules are then
used to generate a library. While all DNA will be recovered, the identification of the
organism originally containing the DNA may be lost. This method offers the ability to
recover DNA from "unclonable sources."
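
The reason common sequences reanneal faster than rare ones can be illustrated with the textbook second-order reassociation model, in which the concentration of a sequence remaining single-stranded after time t is C0 / (1 + k·C0·t). The sketch below is an idealization and not the procedure of the invention: it assumes that every fragment shares one rate constant k, that each sequence pairs only with its own complement, and that concentrations and time are in arbitrary consistent units. The species names and numbers are hypothetical.

```python
def single_stranded_remaining(c0, k, t):
    """Concentration still single-stranded after time t (ideal second-order kinetics)."""
    return c0 / (1.0 + k * c0 * t)


def single_stranded_composition(abundances, k, t):
    """Relative composition of the single-stranded fraction after reannealing.

    `abundances` maps a hypothetical species label to its starting concentration.
    """
    ss = {name: single_stranded_remaining(c0, k, t)
          for name, c0 in abundances.items()}
    total = sum(ss.values())
    return {name: value / total for name, value in ss.items()}


# Hypothetical starting mixture with a 1000-fold abundance difference.
start = {"abundant": 1000.0, "intermediate": 10.0, "rare": 1.0}
after = single_stranded_composition(start, k=1.0, t=1.0)
```

Under these assumptions the abundant sequence is depleted from the single-stranded pool far more than the rare one, so the abundance ratio among the recovered single strands is much smaller than in the starting mixture.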

Nucleic acid samples derived using the previously described technique are amplified to
complete the normalization process. For example, samples can be amplified using PCR
amplification protocols such as those described by Ko et al. (Ko, 1990b; Ko, 1990a;
Takahashi, 1994), or more preferably, long PCR protocols such as those described by
Barnes (1994) or Cheng (1994).

Normalization can be performed directly, or steps can also be taken to reduce the
complexity of the nucleic acid pools prior to the normalization process. Such reduction
in complexity can be beneficial in recovering nucleic acid from the poorly represented
organisms.



The microorganisms from which the libraries may be prepared include prokaryotic
microorganisms, such as Eubacteria and Archaebacteria, and lower eukaryotic
microorganisms such as fungi, some algae and protozoa. The microorganisms may be
cultured microorganisms or uncultured microorganisms obtained from environmental
samples and such microorganisms may be extremophiles, such as thermophiles,
hyperthermophiles, psychrophiles, psychrotrophs, etc.



As indicated above, the library may be produced from environmental samples, in
which case DNA may be recovered without culturing of an organism, or the DNA may
be recovered from a cultured organism.

Sources of microorganism DNA as a starting material library from which target DNA
is obtained are particularly contemplated to include environmental samples, such as
microbial samples obtained from Arctic and Antarctic ice, water or permafrost
sources, materials of volcanic origin, materials from soil or plant sources in tropical
areas, etc. Thus, for example, genomic DNA may be recovered from either a
culturable or non-culturable organism and employed to produce an appropriate
recombinant expression library for subsequent determination of enzyme activity.

Bacteria and many eukaryotes have a coordinated mechanism for regulating genes
whose products are involved in related processes. The genes are clustered, in
structures referred to as "gene clusters," on a single chromosome and are transcribed
together under the control of a single regulatory sequence, including a single
promoter which initiates transcription of the entire cluster. The gene cluster, the
promoter, and additional sequences that function in regulation altogether are referred
to as an "operon" and can include up to 20 or more genes, usually from 2 to 6 genes.
Thus, a gene cluster is a group of adjacent genes that are either identical or related,
usually as to their function.

Some gene families consist of identical members. Clustering is a prerequisite for
maintaining identity between genes, although clustered genes are not necessarily
identical. Gene clusters range from extremes where a duplication is generated to
adjacent related genes to cases where hundreds of identical genes lie in a tandem
array. Sometimes no significance is discernible in a repetition of a particular gene.
A principal example of this is the expressed duplicate insulin genes in some species,
whereas a single insulin gene is adequate in other mammalian species.

It is important to further research gene clusters and the extent to which the full length
of the cluster is necessary for the expression of the proteins resulting therefrom.
Further, gene clusters undergo continual reorganization and, thus, the ability to create
heterogeneous libraries of gene clusters from, for example, bacterial or other
prokaryote sources is valuable in determining sources of novel proteins, particularly
including enzymes such as, for example, the polyketide synthases that are responsible
for the synthesis of polyketides having a vast array of useful activities. Other types of
proteins that are the product(s) of gene clusters are also contemplated, including, for
example, antibiotics, antivirals, antitumor agents and regulatory proteins, such as
insulin.

Polyketides are molecules which are an extremely rich source of bioactivities,
including antibiotics (such as tetracyclines and erythromycin), anti-cancer agents
(daunomycin), immunosuppressants (FK506 and rapamycin), and veterinary products
(monensin). Many polyketides (produced by polyketide synthases) are valuable as
therapeutic agents. Polyketide synthases are multifunctional enzymes that catalyze
the biosynthesis of a huge variety of carbon chains differing in length and patterns of
functionality and cyclization. Polyketide synthase genes fall into gene clusters and at
least one type (designated type I) of polyketide synthases have large size genes and
enzymes, complicating genetic manipulation and in vitro studies of these
genes/proteins.



The ability to select and combine desired components from a library of polyketides
and postpolyketide biosynthesis genes for generation of novel polyketides for study is
appealing. The method(s) of the present invention make possible, and facilitate,
the cloning of novel polyketide synthases, since one can generate gene banks with
clones containing large inserts (especially when using the f-factor based vectors),
which facilitates cloning of gene clusters.

Preferably, the gene cluster DNA is ligated into a vector, particularly wherein a vector
further comprises expression regulatory sequences which can control and regulate the
production of a detectable protein or protein-related array activity from the ligated
gene clusters. Vectors which have an exceptionally large capacity for
exogenous DNA introduction are particularly appropriate for use with such gene
clusters and are described by way of example herein to include the f-factor (or fertility
factor) of E. coli. This f-factor of E. coli is a plasmid which effects high-frequency
transfer of itself during conjugation and is ideal to achieve and stably propagate large
DNA fragments, such as gene clusters from mixed microbial samples.



LIBRARY SCREENING: 

After normalized libraries have been generated, unique enzymatic activities can be
discovered using a variety of solid- or liquid-phase screening assays in a variety of
formats, including a high-throughput robotic format described herein. The
normalization of the DNA used to construct the libraries is a key component in the
process. Normalization will increase the representation of DNA from important
organisms, including those represented in minor amounts in the sample.

Example 1
DNA Isolation

1. Samples are resuspended directly in the following buffer:

      500 mM Tris-HCl, pH 8.0
      100 mM NaCl
      1 mM sodium citrate
      100 µg/ml polyadenosine
      5 mg/ml lysozyme

2. Incubate at 37 °C for 1 hour with occasional agitation.

3. Digest with 2 mg/ml Proteinase K enzyme (Boehringer Mannheim)
   at 37 °C for 30 min.

4. Add 8 ml of lysis buffer [200 mM Tris-HCl, pH 8.0/100 mM
   NaCl/4% (wt/vol) SDS/10% (wt/vol) 4-aminosalicylate] and mix
   gently by inversion.

5. Perform three cycles of freezing in a dry ice-ethanol bath and
   thawing in a 65 °C water bath to release nucleic acids.

6. Extract the mixture with phenol and then
   phenol/chloroform/isoamyl alcohol.

7. Add 4 grams of acid-washed polyvinylpolypyrrolidone (PVPP) to
   the aqueous phase and incubate 30 minutes at 37 °C to remove
   organic contamination.

8. Pellet PVPP and filter the supernatant through a 0.45 µm
   membrane to remove residual PVPP.

9. Precipitate nucleic acids with isopropyl alcohol.

10. Resuspend pellet in 500 µl TE (10 mM Tris-HCl, pH 8.0/1.0 mM
    EDTA).

11. Add 0.1 g of ammonium acetate and centrifuge mixture at 4 °C for
    30 minutes.

12. Precipitate nucleic acids with isopropanol.
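
When preparing the resuspension buffer of step 1 from concentrated stocks, the dilution relation C1·V1 = C2·V2 gives the volume of each stock. The sketch below is illustrative only: the stock concentrations (1 M Tris-HCl, 5 M NaCl, 100 mM sodium citrate) and the 10 ml batch size are assumptions chosen for the example, not values from the protocol, and the components specified by mass per volume (polyadenosine, lysozyme) are left out.

```python
def stock_volumes(final_volume_ml, targets_mM, stocks_mM):
    """Volumes (ml) of each stock, plus water, for a buffer of given final molarity.

    Uses C1 * V1 = C2 * V2 for every component; all concentrations are in mM.
    """
    volumes = {}
    for component, target in targets_mM.items():
        stock = stocks_mM[component]
        if target > stock:
            raise ValueError(f"stock for {component!r} is too dilute")
        volumes[component] = final_volume_ml * target / stock
    water = final_volume_ml - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks are too dilute to reach the final volume")
    volumes["water"] = water
    return volumes


# Target molarities from step 1; stock molarities are hypothetical.
targets = {"Tris-HCl pH 8.0": 500.0, "NaCl": 100.0, "sodium citrate": 1.0}
stocks = {"Tris-HCl pH 8.0": 1000.0, "NaCl": 5000.0, "sodium citrate": 100.0}
recipe = stock_volumes(10.0, targets, stocks)
```

The function raises an error when a stock is too dilute rather than returning a negative water volume, which makes an infeasible recipe obvious.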

Example 2 

Bis-Benzimide Separation of DNA

A sample composed of genomic DNA from Clostridium perfringens (27%
G+C), Escherichia coli (49% G+C) and Micrococcus lysodeikticus (72% G+C) was
purified on a cesium-chloride gradient. The cesium chloride (Rf = 1.3980) solution
was filtered through a 0.2 µm filter and 15 ml were loaded into a 35 ml OptiSeal tube
(Beckman). The DNA was added and thoroughly mixed. Ten micrograms of
bis-benzimide (Sigma; Hoechst 33258) were added and mixed thoroughly. The tube was
then filled with the filtered cesium chloride solution and spun in a VTi50 rotor in a
Beckman L8-70 Ultracentrifuge at 33,000 rpm for 72 hours. Following
centrifugation, a syringe pump and fractionator (Brandel Model 186) were used to
drive the gradient through an ISCO UA-5 UV absorbance detector set to 280 nm.
Three peaks representing the DNA from the three organisms were obtained. PCR
amplification of DNA encoding rRNA from a 10-fold dilution of the E. coli peak was
performed with the following primers to amplify eubacterial sequences:

Forward primer: (27F)

5'-AGAGTTTGATCCTGGCTCAG-3'

Reverse primer: (1492R)

5'-GGTTACCTTGTTACGACTT-3'

Example 3

Sample of DNA obtained from the gill tissue of a clam
harboring an endosymbiont which cannot be
physically separated from its host



1. Purify DNA on cesium chloride gradient according to published
   protocols (Sambrook, 1989).

2. Prepare second cesium chloride solution (Rf = 1.3980); filter
   through 0.2 µm filter and load 15 ml into a 35 ml OptiSeal tube
   (Beckman).

3. Add 10 µg bis-benzimide (Sigma; Hoechst 33258) and mix.

4. Add 50 ng purified DNA and mix thoroughly.

5. Spin in a VTi50 rotor in a Beckman L8-70 Ultracentrifuge at
   33,000 rpm for 72 hours.

6. Use syringe pump and fractionator (Brandel Model 186) to drive
   gradient through an ISCO UA-5 UV absorbance detector set to
   280 nm.

Example 4 
Complexity Analysis 

1. 16S rRNA analysis is used to analyze the complexity of the DNA
   recovered from environmental samples (Reysenbach, 1992;
   DeLong, 1992; Barns, 1994) according to the protocol outlined in
   Example 1.

2. Eubacterial sequences are amplified using the following primers:

   Forward:
   5'-AGAGTTTGATCCTGGCTCAG-3'
   Reverse:
   5'-GGTTACCTTGTTACGACTT-3'

   Archaeal sequences are amplified using the following primers:

   Forward:
   5'-GCGGATCCGCGGCCGCTGCACAYCTGGTYGATYCTGCC-3'
   Reverse:
   5'-GACGGGCGGTGTGTRCA-3' (R = purine; Y = pyrimidine)

3. Amplification reactions proceed as published. The reaction buffer
   used in the amplification of the archaeal sequences includes 5%
   acetamide (Barns, 1994).

4. The products of the amplification reactions are rendered blunt
   ended by incubation with Pfu DNA polymerase.

5. The blunt-ended products are ligated into the pCR-Script plasmid in
   the presence of SrfI restriction endonuclease according to the
   manufacturer's protocol (Stratagene Cloning Systems).

6. Samples are sequenced using standard sequencing protocols
   (reference) and the number of different sequences present in the
   sample is determined.
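
Two parts of this example lend themselves to a short illustration: the degenerate archaeal primers of step 2 and the count of different sequences in step 6. The sketch below is a minimal aid under stated assumptions, not the sequencing analysis actually used. The function names are hypothetical, only the two ambiguity codes named in the text (R and Y) are handled, and "different" is taken to mean exact string mismatch after normalizing case, with no allowance for sequencing errors.

```python
from itertools import product

# IUPAC codes needed for the primers quoted above (R = purine, Y = pyrimidine).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT"}


def expand_degenerate(primer):
    """Return every unambiguous sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


def count_distinct(sequences):
    """Count different sequences, ignoring case and surrounding whitespace."""
    return len({seq.strip().upper() for seq in sequences})


archaeal_reverse = expand_degenerate("GACGGGCGGTGTGTRCA")
archaeal_forward = expand_degenerate("GCGGATCCGCGGCCGCTGCACAYCTGGTYGATYCTGCC")
```

The reverse primer contains one R and therefore stands for two sequences; the forward primer contains three Y positions and stands for eight.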

Example 5

Normalization

Purified DNA is fractionated according to the bis-benzimide protocol of Example (2),
and recovered DNA is sheared or enzymatically digested to 3-6 kb fragments.
Lone-linker primers are ligated and the DNA is size selected. Size-selected DNA is
amplified by PCR, if necessary.

Normalization is then accomplished as follows:

1. Double-stranded DNA sample is resuspended in hybridization
buffer (0.12 M NaH2PO4, pH 6.8/0.82 M NaCl/1 mM EDTA/0.1%
SDS).

2. Sample is overlaid with mineral oil and denatured by boiling for 10
minutes.

3. Sample is incubated at 68 °C for 12-36 hours.

4. Double-stranded DNA is separated from single-stranded DNA
according to standard protocols (Sambrook, 1989) on
hydroxyapatite at 60 °C.

5. The single-stranded DNA fraction is desalted and amplified by
PCR.

6. The process is repeated for several more rounds (up to 5 or more).
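
The reason repeated rounds even out the library can be illustrated with a toy model. The sketch below assumes ideal second-order reassociation, in which a species at relative concentration C leaves C / (1 + k·t·C) single-stranded after incubation, followed by an idealized uniform PCR step. The product `k_t`, the starting abundances, and the function name are hypothetical illustration values, not parameters taken from the protocol.

```python
def normalize_round(abundance: dict[str, float], k_t: float) -> dict[str, float]:
    """One round: reanneal, keep the single-stranded fraction, re-amplify.

    Assumes ideal second-order kinetics, so a species at concentration C
    leaves C / (1 + k_t * C) single-stranded. k_t (rate constant times
    incubation time) is an illustrative number, not a measured value.
    """
    single_stranded = {s: c / (1.0 + k_t * c) for s, c in abundance.items()}
    total = sum(single_stranded.values())
    # Idealized PCR step: rescale so the fractions again sum to 1.
    return {s: c / total for s, c in single_stranded.items()}


# Hypothetical starting population: one dominant genome and two rare ones.
population = {"abundant": 0.98, "rare_1": 0.01, "rare_2": 0.01}
history = [population]
for _ in range(5):  # the protocol repeats the process for several rounds
    history.append(normalize_round(history[-1], k_t=100.0))

ratios = [pop["abundant"] / pop["rare_1"] for pop in history]
for round_no, ratio in enumerate(ratios):
    print(f"round {round_no}: abundant/rare ratio = {ratio:.3f}")
```

In this model the abundant species reanneals faster and is preferentially removed with the double-stranded fraction, so the abundant-to-rare ratio falls toward 1 with each round.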

Example 6 
Library Construction 

1. Genomic DNA dissolved in TE buffer is vigorously passed through
a 25 gauge double-hubbed needle until the sheared fragments are in
the desired size range.

2. DNA ends are "polished" or blunted with Mung Bean nuclease.

3. EcoRI restriction sites in the target DNA are protected with EcoRI
methylase.

4. EcoRI linkers [GGAATTCC] are ligated to the blunted/protected
DNA using a very high molar ratio of linkers to target DNA.

5. Linkers are cut back with EcoRI restriction endonuclease and the
DNA is size fractionated using sucrose gradients.

6. Target DNA is ligated to the λZAPII vector, packaged using in
vitro lambda packaging extracts, and grown in the appropriate E. coli
XL1 Blue host cell.
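
The methylase step matters because the same enzyme that trims the linkers would also cut any EcoRI site inside the insert. As a rough illustration (the insert sequence below is hypothetical, and the helper name `find_sites` is an assumption), this sketch locates EcoRI recognition sites in a linkered fragment and separates linker sites from internal ones.

```python
ECORI_SITE = "GAATTC"      # EcoRI recognition sequence
ECORI_LINKER = "GGAATTCC"  # linker given in step 4


def find_sites(seq: str, site: str = ECORI_SITE) -> list[int]:
    """Return 0-based start positions of every occurrence of site in seq."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


# Hypothetical blunt-ended insert containing one internal EcoRI site.
insert = "ATGCCGTAGAATTCTTGACCGTA"
construct = ECORI_LINKER + insert + ECORI_LINKER

offset = len(ECORI_LINKER)
# Sites that start inside the insert, reported in insert coordinates.
internal = [p - offset for p in find_sites(construct)
            if offset <= p < offset + len(insert)]

print("sites in linkered construct:", find_sites(construct))
print("internal sites needing methylase protection:", internal)
```

Without methylation, digestion in step 5 would cleave at the internal site as well as at the two linker sites, fragmenting the insert.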
Example 7
Library Screening

The following is a representative example of a procedure for screening an expression 
library prepared in accordance with Example 6. 



The general procedures for testing for various chemical characteristics are generally
applicable to substrates other than those specifically referred to in this Example.

Screening for Activity. Plates of the library prepared as described in Example 6 are
used to multiply inoculate a single plate containing 200 µL of LB Amp/Meth,
glycerol in each well. This step is performed using the High Density Replicating
Tool (HDRT) of the Beckman Biomek with a 1% bleach, water, isopropanol, air-dry
sterilization cycle between each inoculation. The single plate is grown for 2 h at 37 °C
and is then used to inoculate two white 96-well Dynatech microtiter daughter plates
containing 250 µL of LB Amp/Meth, glycerol in each well. The original single plate
is incubated at 37 °C for 18 h, then stored at -80 °C. The two condensed daughter
plates are incubated at 37 °C also for 18 h. The condensed daughter plates are then
heated at 70 °C for 45 min. to kill the cells and inactivate the host E. coli enzymes. A
stock solution of 5 mg/mL morphourea phenylalanyl-7-amino-4-trifluoromethyl
coumarin (MuPheAFC, the 'substrate') in DMSO is diluted to 600 µM with 50 mM
pH 7.5 Hepes buffer containing 0.6 mg/mL of the detergent dodecyl maltoside.

MuPheAFC

Fifty µL of the 600 µM MuPheAFC solution is added to each of the wells of the white
condensed plates with one 100 µL mix cycle using the Biomek to yield a final
concentration of substrate of ~100 µM. The fluorescence values are recorded
(excitation = 400 nm, emission = 505 nm) on a plate reading fluorometer
immediately after addition of the substrate (t=0). The plate is incubated at 70 °C for
100 min, then allowed to cool to ambient temperature for 15 additional minutes. The
fluorescence values are recorded again (t=100). The values at t=0 are subtracted from
the values at t=100 to determine if an active clone is present.
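
The arithmetic in this step can be sketched as follows. The dilution check uses the volumes stated above (50 µL of 600 µM substrate into a 250 µL well). The plate readings, the well names, and the cutoff `threshold` are hypothetical; the text does not state what fluorescence increase counts as active, so the cutoff is an assumption chosen only for illustration.

```python
def final_concentration_um(stock_um: float, added_ul: float, well_ul: float) -> float:
    """Concentration after adding added_ul of stock to well_ul of culture."""
    return stock_um * added_ul / (added_ul + well_ul)


def call_active_wells(t0: dict[str, float], t100: dict[str, float],
                      threshold: float) -> dict[str, float]:
    """Return wells whose fluorescence increase (t=100 minus t=0) exceeds threshold."""
    deltas = {well: t100[well] - t0[well] for well in t0}
    return {well: delta for well, delta in deltas.items() if delta > threshold}


# 50 uL of 600 uM substrate into a 250 uL well: 600 * 50 / 300 = 100 uM.
substrate_um = final_concentration_um(600.0, 50.0, 250.0)
print(substrate_um)

# Hypothetical plate readings (arbitrary fluorescence units).
t0 = {"A1": 120.0, "A2": 118.0, "A3": 125.0}
t100 = {"A1": 131.0, "A2": 940.0, "A3": 129.0}
hits = call_active_wells(t0, t100, threshold=200.0)
print(hits)
```

Subtracting the t=0 reading removes background fluorescence from the medium and substrate, so only the increase produced during the 70 °C incubation is scored.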

The data will indicate whether one of the clones in a particular well is hydrolyzing the
substrate. In order to determine the individual clone which carries the activity, the
source library plates are thawed and the individual clones are used to singly inoculate
a new plate containing LB Amp/Meth, glycerol. As above, the plate is incubated at
37 °C to grow the cells, heated at 70 °C to inactivate the host enzymes, and 50 µL of
600 µM MuPheAFC is added using the Biomek. Additionally three other substrates
are tested. They are methyl umbelliferone heptanoate, the CBZ-arginine rhodamine
derivative, and fluorescein-conjugated casein (~3.2 mol fluorescein per mol of
casein).

The umbelliferone and rhodamine are added as 600 µM stock solutions in 50 µL of
Hepes buffer. The fluorescein conjugated casein is also added in 50 µL at a stock
concentration of 20 and 200 mg/mL. After addition of the substrates the t=0
fluorescence values are recorded, the plate is incubated at 70 °C, and the t=100 min.
values are recorded as above.

These data indicate which plate the active clone is in, where the arginine rhodamine
derivative is also turned over by this activity, but the lipase substrate, methyl
umbelliferone heptanoate, and protein, fluorescein-conjugated casein, do not function
as substrates.

Chiral amino esters may be determined using at least the following substrates: 

For each substrate which is turned over the enantioselectivity value, E, is determined
according to the equation below:

$$E = \frac{\ln[1 - c(1 + ee_p)]}{\ln[1 - c(1 - ee_p)]}$$

where ee_p = the enantiomeric excess (ee) of the hydrolyzed product and c = the
percent conversion of the reaction. See Wong and Whitesides, Enzymes in Synthetic
Organic Chemistry, 1994, Elsevier, Tarrytown, New York, pp. 9-12.
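
A short numerical sketch of the E calculation follows. It assumes that both the conversion c and the product enantiomeric excess are entered as fractions between 0 and 1 (for example, 40% conversion as 0.40), since the logarithm arguments are only meaningful on that scale. The function name and the example inputs are illustrative, not values from the text.

```python
import math


def enantioselectivity(c: float, ee_p: float) -> float:
    """E = ln[1 - c(1 + ee_p)] / ln[1 - c(1 - ee_p)].

    c    : conversion as a fraction (0 < c < 1), an assumed convention
    ee_p : enantiomeric excess of the product as a fraction (0 <= ee_p < 1)
    """
    if not 0.0 < c < 1.0:
        raise ValueError("c must be a fraction between 0 and 1")
    if not 0.0 <= ee_p < 1.0:
        raise ValueError("ee_p must be a fraction in [0, 1)")
    numerator_arg = 1.0 - c * (1.0 + ee_p)
    denominator_arg = 1.0 - c * (1.0 - ee_p)
    if numerator_arg <= 0.0:
        raise ValueError("c(1 + ee_p) must be below 1 for the logarithm to exist")
    return math.log(numerator_arg) / math.log(denominator_arg)


# Hypothetical measurement: 40% conversion, product ee of 90%.
e_value = enantioselectivity(0.40, 0.90)
print(f"E = {e_value:.1f}")
```

A non-selective reaction (ee_p = 0) gives E = 1 in this formula, and E rises steeply as the product ee approaches 1 at a fixed conversion.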

The enantiomeric excess is determined by either chiral high performance liquid
chromatography (HPLC) or chiral capillary electrophoresis (CE). Assays are
performed as follows: two hundred µL of the appropriate buffer is added to each well
of a 96-well white microtiter plate, followed by 50 µL of partially or completely
purified enzyme solution; 50 µL of substrate is added and the increase in fluorescence
monitored versus time until 50% of the substrate is consumed or the reaction stops,
whichever comes first.

Example 8

Construction of a Stable, Large Insert Picoplankton Genomic DNA Library

Cell collection and preparation of DNA. Agarose plugs containing concentrated
picoplankton cells were prepared from samples collected on an oceanographic cruise
from Newport, Oregon to Honolulu, Hawaii. Seawater (30 liters) was collected in
Niskin bottles, screened through 10 µm Nitex, and concentrated by hollow fiber
filtration (Amicon DC10) through 30,000 MW cutoff polysulfone filters. The
concentrated bacterioplankton cells were collected on a 0.22 µm, 47 mm Durapore
filter, and resuspended in 1 ml of 2X STE buffer (1 M NaCl, 0.1 M EDTA, 10 mM
Tris, pH 8.0) to a final density of approximately 1 x 10^10 cells per ml. The cell
suspension was mixed with one volume of 1% molten Seaplaque LMP agarose
(FMC) cooled to 40 °C, and then immediately drawn into a 1 ml syringe. The syringe
was sealed with parafilm and placed on ice for 10 min. The cell-containing agarose
plug was extruded into 10 ml of Lysis Buffer (10 mM Tris pH 8.0, 50 mM NaCl,
0.1 M EDTA, 1% Sarkosyl, 0.2% sodium deoxycholate, 1 mg/ml lysozyme) and
incubated at 37 °C for one hour. The agarose plug was then transferred to 40 ml of
ESP Buffer (1% Sarkosyl, 1 mg/ml proteinase K, in 0.5 M EDTA), and incubated at
55 °C for 16 hours. The solution was decanted and replaced with fresh ESP Buffer,
and incubated at 55 °C for an additional hour. The agarose plugs were then placed in
50 mM EDTA and stored at 4 °C shipboard for the duration of the oceanographic
cruise.

One slice of an agarose plug (72 µl) prepared from a sample collected off the Oregon
coast was dialyzed overnight at 4 °C against 1 mL of buffer A (100 mM NaCl, 10 mM
Bis Tris Propane-HCl, 100 µg/ml acetylated BSA: pH 7.0 @ 25 °C) in a 2 mL
microcentrifuge tube. The solution was replaced with 250 µl of fresh buffer A
containing 10 mM MgCl2 and 1 mM DTT and incubated on a rocking platform for 1
hr at room temperature. The solution was then changed to 250 µl of the same buffer
containing 4 U of Sau3AI (NEB), equilibrated to 37 °C in a water bath, and then
incubated on a rocking platform in a 37 °C incubator for 45 min. The plug was
transferred to a 1.5 ml microcentrifuge tube and incubated at 68 °C for 30 min to
inactivate the enzyme and to melt the agarose. The agarose was digested and the
DNA dephosphorylated using Gelase and HK-phosphatase (Epicentre), respectively,
according to the manufacturer's recommendations. Protein was removed by gentle
phenol/chloroform extraction and the DNA was ethanol precipitated, pelleted, and
then washed with 70% ethanol. This partially digested DNA was resuspended in
sterile H2O to a concentration of 2.5 ng/µl for ligation to the pFOS1 vector.

PCR amplification results from several of the agarose plugs (data not shown)
indicated the presence of significant amounts of archaeal DNA. Quantitative
hybridization experiments using rRNA extracted from one sample, collected at 200 m
of depth off the Oregon Coast, indicated that planktonic archaea in this assemblage
comprised approximately 4.7% of the total picoplankton biomass (this sample
corresponds to "PAC1"-200 m in Table 1 of DeLong et al., High abundance of
Archaea in Antarctic marine picoplankton, Nature, 371:695-698, 1994). Results from
archaeal-biased rDNA PCR amplification performed on agarose plug lysates
confirmed the presence of relatively large amounts of archaeal DNA in this sample.
Agarose plugs prepared from this picoplankton sample were chosen for subsequent
fosmid library preparation. Each 1 ml agarose plug from this site contained
approximately 7.5 x 10^ cells, therefore approximately 5.4 x 10^ cells were present in
the 72 µl slice used in the preparation of the partially digested DNA.

Vector arms were prepared from pFOS1 as described (Kim et al., Stable propagation
of cosmid sized human DNA inserts in an F factor based vector, Nucl. Acids Res.,
20:10832-10835, 1992). Briefly, the plasmid was completely digested with AstII,
dephosphorylated with HK phosphatase, and then digested with BamHI to generate
two arms, each of which contained a cos site in the proper orientation for cloning and
packaging ligated DNA between 35-45 kbp. The partially digested picoplankton
DNA was ligated overnight to the pFOS1 arms in a 15 µl ligation reaction containing
25 ng each of vector and insert and 1 U of T4 DNA ligase (Boehringer-Mannheim).
The ligated DNA in four microliters of this reaction was in vitro packaged using the
Gigapack XL packaging system (Stratagene), the fosmid particles transfected to E.
coli strain DH10B (BRL), and the cells spread onto LBcm15 plates. The resultant
fosmid clones were picked into 96-well microtiter dishes containing LBcm15
supplemented with 7% glycerol. Recombinant fosmids, each containing ca. 40 kb of
picoplankton DNA insert, yielded a library of 3,552 fosmid clones, containing
approximately 1.4 x 10^8 base pairs of cloned DNA. All of the clones examined
contained inserts ranging from 38 to 42 kbp. This library was stored frozen at -80 °C
for later analysis.
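
The stated library size follows directly from the clone count and insert size. The sketch below reproduces that arithmetic from the figures in the paragraph (3,552 clones, roughly 40 kb inserts, observed range 38 to 42 kbp). The average genome size used for the coverage estimate is a hypothetical value introduced only for illustration; the text gives no genome size.

```python
CLONES = 3552                    # fosmid clones in the library
TYPICAL_INSERT_BP = 40_000       # "ca. 40 kb" per clone
INSERT_RANGE_BP = (38_000, 42_000)

total_bp = CLONES * TYPICAL_INSERT_BP
low_bp, high_bp = (CLONES * bp for bp in INSERT_RANGE_BP)

print(f"cloned DNA: {total_bp:.2e} bp "
      f"(range {low_bp:.2e} to {high_bp:.2e} bp)")

# ASSUMPTION: a 2 Mb average prokaryotic genome, chosen only to show how
# a genome-equivalent figure would be derived; it is not from the source.
ASSUMED_GENOME_BP = 2_000_000
genome_equivalents = total_bp / ASSUMED_GENOME_BP
print(f"about {genome_equivalents:.0f} genome equivalents at 2 Mb per genome")
```

The product, about 1.4 x 10^8 bp, matches the total given for the library.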

Numerous modifications and variations of the present invention are possible in light 
of the above teachings; therefore, within the scope of the claims, the invention may be 
practiced other than as particularly described.

SEQUENCE LISTING

(1) GENERAL INFORMATION

(i) APPLICANT: Recombinant Biocatalysis, Inc.

(ii) TITLE OF INVENTION: PRODUCTION AND USE OF NORMALIZED DNA LIBRARIES

(iii) NUMBER OF SEQUENCES: 10

(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: FISH & RICHARDSON
(B) STREET: 4225 EXECUTIVE SQUARE, STE. 1400
(C) CITY: LA JOLLA
(D) STATE: CA
(E) COUNTRY: USA
(F) ZIP: 92037

(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: 3.5 INCH DISKETTE
(B) COMPUTER: IBM PS/2
(C) OPERATING SYSTEM: MS-DOS
(D) SOFTWARE: WORD PERFECT 6.0

(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER: Unassigned
(B) FILING DATE: 18 June 1997
(C) CLASSIFICATION: Unassigned

(vii) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: 08/665,565
(B) FILING DATE: 18 June 1996
(C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: LISA A. HAILE, Ph.D.
(B) REGISTRATION NUMBER: 38,347
(C) REFERENCE/DOCKET NUMBER: 09010/019WO1

(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: 619-678-5070
(B) TELEFAX: 619-678-5099

(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 52 NUCLEOTIDES
(B) TYPE: NUCLEIC ACID
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
CCGAGAATTC ATTAAAGAGG AGAAATTAAC TATGATTGAA GACCCTATGG AC

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 31 NUCLEOTIDES
(B) TYPE: NUCLEIC ACID
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
CGGAAGATCT TTAAGCACTT CTCTCAGGTT C

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 52 NUCLEOTIDES
(B) TYPE: NUCLEIC ACID
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
CCGAGAATTC ATTAAAGAGG AGAAATTAAC TATGGACAGG CTTGAAAAAG TA

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 31 NUCLEOTIDES
(B) TYPE: NUCLEIC ACID
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
CGGAAGATCT TCAGCTAAGC TTCTCTAAGA A



(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 52 NUCLEOTIDES
(B) TYPE: NUCLEIC ACID
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
CCGACAATTG ATTAAAGAGG AGAAATTAAC TATGTGGGAA TTAGACCCTA AA



(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 31 NUCLEOTIDES
(B) TYPE: NUCLEIC ACID
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
CGGAGGATCC CTACACCTGT TTTTCAAGCT C

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 52 NUCLEOTIDES
(B) TYPE: NUCLEIC ACID
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
CCGACAATTG ATTAAAGAGG AGAAATTAAC TATGACATAC TTAATGAACA AT

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 31 NUCLEOTIDES
(B) TYPE: NUCLEIC ACID
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
CGGAAGATCT TTATGAGAAG TCCCTTTCAA G

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 52 NUCLEOTIDES
(B) TYPE: NUCLEIC ACID
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
CCGAGAATTC ATTAAAGAGG AGAAATTAAC TATGCGGAAA CTGGCCGAGC



(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 31 NUCLEOTIDES
(B) TYPE: NUCLEIC ACID
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
CGGAGGATCC TTAAAGTGCC GCTTCGATCA A
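
Because each entry in the listing declares a length, the transcribed sequences can be checked mechanically. The sketch below does this for four of the entries and also reports which common restriction sites each contains. The enzyme annotations (EcoRI, BglII, BamHI) are an assumption added here for illustration; the listing itself does not name them. As transcribed in this text, SEQ ID NO: 9 has 50 nucleotides against a declared length of 52.

```python
import re

# (declared length, sequence as printed) for a subset of the listing.
LISTING = {
    1: (52, "CCGAGAATTC ATTAAAGAGG AGAAATTAAC TATGATTGAA GACCCTATGG AC"),
    2: (31, "CGGAAGATCT TTAAGCACTT CTCTCAGGTT C"),
    9: (52, "CCGAGAATTC ATTAAAGAGG AGAAATTAAC TATGCGGAAA CTGGCCGAGC"),
    10: (31, "CGGAGGATCC TTAAAGTGCC GCTTCGATCA A"),
}

# ASSUMPTION: sites annotated for illustration only.
SITES = {"EcoRI": "GAATTC", "BglII": "AGATCT", "BamHI": "GGATCC"}


def ungapped(seq: str) -> str:
    """Remove the spacing used in the printed listing."""
    return re.sub(r"\s+", "", seq).upper()


def check_listing(listing: dict[int, tuple[int, str]]) -> dict[int, tuple[int, bool, list[str]]]:
    """Map SEQ ID -> (actual length, matches declared length, sites found)."""
    report = {}
    for seq_id, (declared, text) in listing.items():
        seq = ungapped(text)
        sites = [name for name, site in SITES.items() if site in seq]
        report[seq_id] = (len(seq), len(seq) == declared, sites)
    return report


report = check_listing(LISTING)
for seq_id, (length, ok, sites) in report.items():
    print(f"SEQ ID NO: {seq_id}: {length} nt, length ok = {ok}, sites = {sites}")
```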
TABLE 1



A2

Fluorescein conjugated casein (3.2 mol fluorescein/mol casein)
CBZ-Ala-AMC
t-BOC-Ala-Ala-Asp-AMC
succinyl-Ala-Gly-Leu-AMC
CBZ-Arg-AMC
CBZ-Met-AMC
morphourea-Phe-AMC

t-BOC = t-butoxy carbonyl, CBZ = carbonyl benzyloxy.
AMC = 7-amino-4-methyl coumarin

Fluorescein conjugated casein
t-BOC-Ala-Ala-Asp-AFC
CBZ-Ala-Ala-Lys-AFC
succinyl-Ala-Ala-Phe-AFC
succinyl-Ala-Gly-Leu-AFC

(AFC = 7-amino-4-trifluoromethyl coumarin.)

AH3

AA3

AB3

AC3

Fluorescein conjugated casein
succinyl-Ala-Ala-Phe-AFC
CBZriic.AFC
CBZ-Trp-AFC

AI3

succinyl-Ala-Gly-Leu-AFC
CBZ-Ala-AFC
C07.-?;e*'AK-

AG3

CBZ-Ala-Ala-Lys-AFC
CB7A;s-AFC

TABLE 2

[Chemical structures not reproduced. Compound labels: L2, LB3, LC3, LD3, LE3 (and all of L2).]

TABLE 3

[Chemical structures not reproduced. Legible compound labels: LI3, LN3, LO3 (and all of L2).]

TABLE 4



4-methyl umbelliferone
wherein R =

G2   β-D-galactose
     β-D-glucose
     β-D-glucuronide
GB3  β-D-cellotrioside
     β-D-cellobiopyranoside
GC3  β-D-galactose
     α-D-galactose
GD3  β-D-glucose
     α-D-glucose
GE3  β-D-glucuronide
GI3  β-D-N,N-diacetylchitobiose
GJ3  β-D-fucose
     α-L-fucose
     β-L-fucose
GK3  β-D-mannose
     α-D-mannose

non-Umbelliferyl substrates

GA3  amylose [polyglucan α1,4 linkages], amylopectin
     [polyglucan branching α1,6 linkages]
GF3  xylan [poly 1,4-D-xylan]
GG3  amylopectin, pullulan
GH3  sucrose, fructofuranoside

What Is Claimed Is:

1. A process for producing a normalized genomic DNA library from an
environmental sample, which comprises the steps of:

(a) isolating a genomic DNA population from the environmental sample;

(b) analyzing the complexity of the genomic DNA population so isolated;

(c) at least one of the steps selected from the group consisting of (i) amplifying
the copy number of the DNA population so isolated and (ii) recovering a fraction of
the isolated genomic DNA having a desired characteristic; and

(d) normalizing the representation of various DNAs within the genomic DNA
population so as to form a normalized library of genomic DNA from the
environmental sample.

2. The process of claim 1 which comprises the step of recovering a fraction of 
the isolated genomic DNA having a desired characteristic. 

3. The process of claim 1 which comprises the step of amplifying the copy
number of the DNA population so isolated.

4. The process of claim 1 wherein the step of amplifying the genomic DNA 
precedes the normalizing step. 

5. The process of claim 1 wherein the step of normalizing the genomic DNA 
precedes the amplifying step. 

6. The process of claim 1 which comprises both the steps of (i) amplifying the
copy number of the DNA population so isolated and (ii) recovering a fraction of the
isolated genomic DNA having a desired characteristic.

7. A normalized genomic DNA library formed from an environmental
sample by a process comprising the steps of:

(a) isolating a genomic DNA population from the environmental sample;

(b) analyzing the complexity of the genomic DNA population so isolated;

(c) at least one of (i) amplifying the copy number of the DNA population so
isolated and (ii) recovering a fraction of the isolated genomic DNA having a desired
characteristic; and

(d) normalizing the representation of various DNAs within the genomic DNA
population so as to form a normalized library of genomic DNA from the
environmental sample.

8. The library of claim 1 wherein the process of forming said library comprises
the step of recovering a fraction of the isolated genomic DNA having a desired
characteristic.

9. The library of claim 1 wherein the process of forming said library comprises
the step of amplifying the copy number of the DNA population so isolated.

10. The library of claim 1 wherein in the process of forming said library the
step of amplifying the genomic DNA precedes the normalizing step.

11. The library of claim 1 wherein in the process of forming said library the
step of normalizing the genomic DNA precedes the amplifying step.

12. The library of claim 1 wherein the process of forming said library
comprises both the steps of (i) amplifying the copy number of the DNA population so
isolated and (ii) recovering a fraction of the isolated genomic DNA having a desired
characteristic.

13. A process for forming a normalized library of genomic gene clusters from
an environmental sample which comprises

(a) isolating a genomic DNA population from the environmental sample;

(b) analyzing the complexity of the genomic DNA population so isolated;

(c) at least one of (i) amplifying the copy number of the DNA population so
isolated and (ii) recovering a fraction of the isolated genomic DNA having a desired
characteristic; and

(d) normalizing the representation of various DNAs within the genomic DNA
population so as to form a normalized library of genomic DNA from the
environmental sample.

14. A normalized library of genomic gene clusters formed from an
environmental sample by a process comprising the steps of

(a) isolating a genomic DNA population from the environmental sample;

(b) analyzing the complexity of the genomic DNA population so isolated;

(c) at least one of (i) amplifying the copy number of the DNA population so
isolated and (ii) recovering a fraction of the isolated genomic DNA having a desired
characteristic; and

(d) normalizing the representation of various DNAs within the genomic DNA
population so as to form a normalized library of genomic DNA from the
environmental sample.